Motivation

In the past few years, single-cell and spatial technologies have significantly improved our knowledge of cellular heterogeneity and architecture in health and disease, providing unprecedented insight into hematopoiesis.

A crucial and challenging step in single-cell data analysis is the annotation of cell types. An effective way to identify cell types accurately is to use marker genes, whose expression is specific to one or a few cell types. The recent increase in the number of databases collecting marker genes could simplify the automatic annotation of cells into known hematopoietic populations. Nevertheless, different databases contain dissimilar marker sets for the same cell type, which inevitably leads to inconsistent annotations depending on the chosen source. Furthermore, the lack of a standard classification and nomenclature of cell types across databases increases the annotation discrepancies.

Methods

We developed the Cell Marker Accordion, an R Shiny web app that addresses the need for robust and reproducible identification of hematopoietic cell types in single-cell datasets. As data sources, we considered multiple published databases collecting human and mouse marker genes for hematopoietic cell types (Zhang et al., 2019; Franzén et al., 2019; Paisley & Liu, 2021; Börner et al., 2021; Liberzon et al., 2015; Hao et al., 2021; Domínguez Conde et al., 2022) and standard collections of widely used cell sorting markers (Abcam; Thermo Fisher Scientific).

Annotations were standardized by mapping the original cell type labels to the Cell Ontology, which is becoming a reference ontology for cell types and states (Sutherland et al., 2021). Next, the databases were integrated, yielding a comprehensive set of 7514 marker genes associated with 126 standardized hematopoietic cell types.
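
The standardization and integration step can be illustrated with a minimal Python sketch. It is not the Accordion's actual implementation: the label mapping, the database names (`db_a`, `db_b`, `db_c`), and the function `integrate` are hypothetical, and real mappings would be curated against Cell Ontology terms rather than hard-coded.

```python
from collections import defaultdict

# Hypothetical mapping from database-specific labels to standardized
# Cell Ontology names (illustrative only, not taken from the source).
LABEL_TO_ONTOLOGY = {
    "B cell": "B cell",
    "B lymphocyte": "B cell",
    "B-cell": "B cell",
    "NK cell": "natural killer cell",
    "Natural killer cell": "natural killer cell",
}


def integrate(databases):
    """Merge per-database (label, gene) records into
    {standardized cell type: {gene: set of supporting sources}}."""
    merged = defaultdict(lambda: defaultdict(set))
    for source, records in databases.items():
        for label, gene in records:
            cell_type = LABEL_TO_ONTOLOGY.get(label)
            if cell_type is None:
                continue  # labels without a mapping are skipped in this sketch
            merged[cell_type][gene].add(source)
    return merged


# Toy input: three hypothetical databases with inconsistent labels.
databases = {
    "db_a": [("B lymphocyte", "CD19"), ("NK cell", "NCAM1")],
    "db_b": [("B-cell", "CD19"), ("B-cell", "MS4A1")],
    "db_c": [("Natural killer cell", "NCAM1"), ("B cell", "CD79A")],
}
merged = integrate(databases)
```

Keeping the set of supporting sources for every gene and cell type pair, rather than a flat gene list, preserves the information needed later to judge how consistently a marker is reported.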

Results

The Cell Marker Accordion web interface makes it easy to retrieve lists of marker genes associated with input cell types and, vice versa, to obtain the matching cell types starting from a list of candidate genes. Hierarchies of hematopoietic cell types can be browsed following the Cell Ontology structure to obtain the desired level of resolution in the markers. Importantly, marker genes can be ranked and selected by specificity, which indicates whether a gene is a marker for multiple cell types, and by evidence consistency score, which measures the agreement among annotation sources.
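
As a rough illustration of ranking and bidirectional lookup, the Python sketch below scores a toy marker table. The abstract does not give the exact formulas, so two assumptions are made here: the evidence consistency score is taken as the number of sources reporting a gene for a cell type, and specificity as the reciprocal of the number of cell types the gene marks. All names and data are hypothetical.

```python
# Toy integrated table: {cell type: {gene: set of supporting sources}}.
merged = {
    "B cell": {"CD19": {"db_a", "db_b"}, "MS4A1": {"db_b"}, "PTPRC": {"db_a"}},
    "T cell": {"CD3E": {"db_a", "db_b", "db_c"}, "PTPRC": {"db_a", "db_c"}},
}


def score_markers(merged):
    """Return rows of (cell_type, gene, ec_score, specificity)."""
    # Count how many cell types each gene is reported as a marker for.
    n_types = {}
    for genes in merged.values():
        for gene in genes:
            n_types[gene] = n_types.get(gene, 0) + 1
    rows = []
    for cell_type, genes in merged.items():
        for gene, sources in genes.items():
            ec_score = len(sources)            # assumed definition
            specificity = 1.0 / n_types[gene]  # assumed definition
            rows.append((cell_type, gene, ec_score, specificity))
    return rows


def top_markers(rows, cell_type, min_ec=1):
    """Markers of one cell type, most specific and best supported first."""
    selected = [r for r in rows if r[0] == cell_type and r[2] >= min_ec]
    return sorted(selected, key=lambda r: (-r[3], -r[2], r[1]))


def cell_types_for(rows, genes):
    """Reverse lookup: candidate genes -> sorted list of matching cell types."""
    return {g: sorted({r[0] for r in rows if r[1] == g}) for g in genes}


rows = score_markers(merged)
b_cell_genes = [r[1] for r in top_markers(rows, "B cell")]
matches = cell_types_for(rows, ["PTPRC", "CD19"])
```

In this toy table, a gene shared between B cells and T cells ranks below genes reported for B cells only, and raising `min_ec` restricts the list to markers supported by several sources.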

To show the potential of the Cell Marker Accordion, we automatically annotated published human bone marrow single-cell datasets. By exploring the expression levels of well-established marker genes, we confirmed that the Accordion identified bone marrow cell types appropriately.

We further validated our approach with a dataset of flow cytometry-sorted blood cell populations that were separately profiled with single-cell RNA-seq (Zheng et al., 2017), and we compared cell type annotations based on the Cell Marker Accordion with annotations based on single database sources. Notably, the Accordion significantly improved accuracy, increasing the number of correctly annotated cells by more than 30% with respect to any single database.
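
The comparison above amounts to counting cells whose predicted label matches the sorted population and expressing the gain relative to a baseline. A minimal Python sketch follows. The labels are made-up toy values, not results from the study, and the function names are hypothetical.

```python
def n_correct(predicted, truth):
    """Number of cells whose predicted label equals the reference label."""
    return sum(p == t for p, t in zip(predicted, truth))


def relative_increase(new, old):
    """Relative gain of `new` over `old`, e.g. 0.3 means 30% more."""
    if old == 0:
        raise ValueError("baseline count must be positive")
    return (new - old) / old


# Toy example: reference labels from sorting versus two annotation runs.
truth      = ["B cell", "B cell", "T cell", "T cell", "monocyte", "monocyte"]
single_db  = ["B cell", "T cell", "T cell", "B cell", "monocyte", "T cell"]
integrated = ["B cell", "B cell", "T cell", "T cell", "monocyte", "T cell"]

baseline = n_correct(single_db, truth)
improved = n_correct(integrated, truth)
gain = relative_increase(improved, baseline)
```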

The Cell Marker Accordion is a user-friendly and flexible tool that can be used to improve the annotation of hematopoietic populations in single-cell and, potentially, spatially resolved datasets focused on the study of hematological pathologies.

Halene: Forma Therapeutics: Consultancy.
